Notes for the Improvement of the Spatial and
Spectral Data Classification Method


I. INTRODUCTION 

A. Background

In the corresponding part of a recent report by the author (Reference 1), a
detailed explanation was given for the present interest in non-supervised
techniques for the automatic classification of satellite multispectral
ground scene data vis-a-vis the techniques involving supervised
computation. A familiarity with Jayroe's report (Reference 2) in this area
is also presupposed.

B. Present Situation 

The author was asked to make a theoretical evaluation of Su's (Reference 3)
and Jayroe's (Reference 2) quite different approaches to non-supervised
classification of satellite multispectral ground scene data. The author
chose to do that effort in three separate steps: (1) to evaluate Su's model
first, independently of Jayroe's model, and to suggest any likely
improvements which would retain the same general idea of the approach,
(2) to do the same for Jayroe's model, and (3) after seeing the effects of
the changes by processing some data with the resulting revised algorithms,
to propose what new model might combine the best compatible parts or
compromises from the two models. The first step was covered in References 1
and 4. Reference 4 gives the complete algorithm, which was given only for
the first pass of the data in Reference 1, and which is included herein as
Appendix A.
II. DISCUSSION


A. Description 

Jayroe's (Reference 2) unsupervised feature extraction process was
developed for the analysis of flight data which has n spectral channels
responding to each elemental area of the ground scene, which is resolved in
rectangular coordinates x and y. His method has four stages, which one can
describe briefly as follows:

(1) A boundary map of the data is produced by separating the data into
homogeneous and inhomogeneous areas. Each resolution element has a root
mean square spectral difference $s_x$ or $s_y$ with respect to the element
which is adjacent to it in the x or y direction. Any element where $s_x$ or
$s_y$ is equal to or less than the average of such values for all of the
elements in the scene is classified as a homogeneous element; otherwise,
the element is classified as a boundary. A digital image of the boundary
map is recorded on magnetic tape for use in the next stage of processing.
See Section II.B.1 for comments.

(2) The second stage is concerned with the selection and spatial merging
of unknown candidate features based upon the homogeneity of the ground
scene, as displayed by the boundary map which was recorded on magnetic
tape in the first stage. See Section II.B.2 for comments.

(3) The third stage of processing is concerned with spectral merging of
the selected unknown candidate features. In this stage the decision, to
merge or not to merge, is based entirely upon spectral information rather
than the spatial information which was used in the second stage. The
boundary and cluster map tape gives the locations of the raw data on the
raw data tape belonging to each cluster. The mean feature vectors and
covariance matrices are calculated for each cluster. "These calculations
are used to define decision boundaries with which to physically surround
the data belonging to a cluster in n-dimensional space. The most general
closed surface that can be used to surround the n-dimensional data is an
n-dimensional hyperellipse. The centroid of the cluster ellipse is given
by the feature vector mean values ..." (quoting Jayroe). One makes "... a
rotation, ..., followed by a diagonal transformation, ..." "Thus, the
equation of an n-dimensional ellipse in reduced form is obtained for each
cluster, and, in general, each cluster will have a different coordinate
system. The next step is to give a decision rule for determining how many
clusters actually represent the same feature... The decision rule is that
two clusters represent the same feature if the centroids of both clusters
are contained in both clusters' ellipses." See Section II.B.3 for comments
and analysis.

(4) Jayroe (Reference 2) explains that the final stage of processing is
concerned with classifying the data in the digital image of the ground
scene and with showing the location and distribution of the features. The
inputs to this stage of processing are the raw data tape, the statistics
for each class, and the boundary tape. The decision rule which Jayroe chose
for classifying a resolution element into a given class, and the basis
which he gave for it, are discussed with some analysis in
Section II.B.4 herein.
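
The first stage, as summarized in item (1), can be sketched in a few lines.
This is only an illustration under stated assumptions, not Jayroe's program:
the scene is taken as a nested list scene[y][x] of n channel values, an
element on the last row or column is compared with itself (difference zero),
and an element is called a boundary when either its $s_x$ or its $s_y$
exceeds the corresponding scene average. The names rms_difference and
boundary_map are hypothetical.

```python
import math


def rms_difference(u, v):
    """Root mean square difference between two n-channel spectral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def boundary_map(scene):
    """Return a grid of booleans: True marks a boundary element.

    Assumption: an element is a boundary when s_x or s_y is larger than
    the average of that quantity over the whole scene.
    """
    rows, cols = len(scene), len(scene[0])
    # Difference with the neighbour in +x (edge elements compare with themselves).
    sx = [[rms_difference(scene[y][x], scene[y][min(x + 1, cols - 1)])
           for x in range(cols)] for y in range(rows)]
    # Difference with the neighbour in +y.
    sy = [[rms_difference(scene[y][x], scene[min(y + 1, rows - 1)][x])
           for x in range(cols)] for y in range(rows)]
    count = rows * cols
    mean_sx = sum(map(sum, sx)) / count
    mean_sy = sum(map(sum, sy)) / count
    return [[sx[y][x] > mean_sx or sy[y][x] > mean_sy
             for x in range(cols)] for y in range(rows)]
```

For a two-channel scene made of a dark half and a bright half, only the
column next to the transition is flagged.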


B. Analysis and Evaluation

1. Stage One: Boundary Mapping

Jayroe (Reference 2) considers the equation of an ellipse in the
$(s_x, s_y)$ plane, involving quadratic and product terms. By using all of
the resolution elements in the ground scene he finds the sample mean values
of $s_x^2$, $s_y^2$, and $s_x s_y$. Those values are used to determine what
transformation will align the coordinates with the principal axes and give
the values of the semi-major and semi-minor axes of the particular ellipse
which the sample mean values imply. That particular ellipse is then found
in the $(s_x, s_y)$ coordinates after the inverse transformation, from
which the values of a, b, and c are determined when the sample mean ellipse
is

$$a s_x^2 + b s_y^2 + c s_x s_y = 1. \tag{1}$$

One could then say, as Jayroe does, that the decision is to classify a
resolution element as being homogeneous unless the left side of equation
(1) exceeds unity. Or, maybe one should say that the left side of equation
(1) is a random variable such that the sample estimator of its mean is
unity, and that the decision is to classify a resolution element as being
homogeneous if

$$a s_x^2 + b s_y^2 + c s_x s_y \le B \tag{2}$$

where B is an adjustable parameter. One could give B a higher value than
unity as a trade-off against excessive computer time, up to some maximum
value of B beyond which experience would show that boundary formation would
be dampened enough to reduce effectiveness materially.
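
A hedged sketch of the equation (2) test follows. The text does not give
Jayroe's formulas for a, b, and c, so this sketch assumes one construction
that satisfies the stated property: the coefficients are taken from the
inverse of the matrix of sample mean values of $s_x^2$, $s_y^2$, and
$s_x s_y$, scaled by 1/2 so that the sample mean of the left side of
equation (1) is exactly unity. The function names are hypothetical.

```python
def ellipse_coefficients(pairs):
    """pairs: list of (s_x, s_y) for every element in the scene.

    Assumption: (a, b, c) come from half the inverse of the second-moment
    matrix, which makes the sample mean of a*sx^2 + b*sy^2 + c*sx*sy equal 1.
    """
    m = len(pairs)
    mxx = sum(sx * sx for sx, _ in pairs) / m
    myy = sum(sy * sy for _, sy in pairs) / m
    mxy = sum(sx * sy for sx, sy in pairs) / m
    det = mxx * myy - mxy * mxy
    a = myy / (2.0 * det)
    b = mxx / (2.0 * det)
    c = -mxy / det  # cross term: 2 * (-mxy / det) / 2
    return a, b, c


def is_homogeneous(sx, sy, coeffs, B=1.0):
    """Equation (2): homogeneous when the quadratic form does not exceed B."""
    a, b, c = coeffs
    return a * sx * sx + b * sy * sy + c * sx * sy <= B
```

Raising B above unity classifies more elements as homogeneous, which is the
trade-off against computer time that the text describes.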
2. Stage Two: Cluster Formation

Whereas the first stage identified each resolution element of the ground
scene as being either a boundary element or a homogeneous element, it
became of interest to see what is the smallest number of elements which a
cluster or class could have. It seems that this depends on the second
stage. Clusters of homogeneous elements are formed in the second stage,
and the resulting clusters are merged into classes in the third stage. The
fourth stage then classifies each element as belonging to or not belonging
to the established classes. Thus, every cluster and every class has at
least $p^2$ members; i.e., no element can be classified unless it is
sufficiently nearly like those which cluster in a homogeneous area which
extends beyond a square array containing $p^2$ elements. Jayroe
(Reference 2) suggests 100 elements for the p x p array. In contrast to
this, the models by Su (Reference 3) and by Dalton (Reference 1) can
classify any isolated element which is sufficiently nearly like any five
other elements (which do not even have to be together). The resolution
elements in the ERTS data are each about 79.2 meters (1/20 mile) x 57.2
meters; i.e., in practical terms, a square field of less than about 5/8
square kilometer (one quarter section) would not accept the 10 x 10 array,
and larger fields are usually not entirely homogeneous.
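
The field-size figure can be checked with a short calculation. The sketch
below assumes the field is aligned with the scan axes, so that a square
field must be at least as wide as the longer side of the p x p array; the
function names are hypothetical.

```python
ELEMENT_X_M = 79.2  # ERTS resolution element size along x, meters
ELEMENT_Y_M = 57.2  # ERTS resolution element size along y, meters


def array_area_km2(p=10):
    """Ground area covered by a p x p array of resolution elements."""
    return (p * ELEMENT_X_M) * (p * ELEMENT_Y_M) / 1.0e6


def min_square_field_km2(p=10):
    """Smallest axis-aligned square field that can contain the p x p array."""
    side_m = p * max(ELEMENT_X_M, ELEMENT_Y_M)
    return (side_m / 1000.0) ** 2
```

With p = 10 the array itself covers about 0.45 square kilometer, and the
smallest square field containing it is 792 meters on a side, or about 0.63
square kilometer, which agrees with the 5/8 square kilometer quoted above.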

3. Stage Three: Spectral Merging 

a. Decision Rule 

In the computations for n spectral channels, Jayroe (Reference 2) made
transformations (a rotation followed by a diagonal transformation) to
reduce to canonical form the covariance matrix for each of the clusters of
homogeneous ground scene elements. This may be computationally efficient;
also, it facilitates theoretical derivations for decision rules because
the transformed dimensions become statistically independent. Jayroe's
analysis through his equation (26) is verified. Instead of Jayroe's
equation (28) for the inverse similarity $S_\ell^{-1}$ for the cluster
$\ell$ and the vector $v_k$ (which is the mean which k's transformed
coordinate system gave for cluster k) as viewed within $\ell$'s coordinate
system, one gets

    [equation not legible in source]                                  (3)

where the c's are variances and where the term 2n, instead of just n,
corrects for a term ${}_{pp}c_k$ which Jayroe inadvertently omitted from
the bracketed factor in his equation (27).

For the expected value of $S_\ell^{-1}$ in this equation (3), Jayroe just
replaced the summation by n. That result would seem to be due to (1) an
oversight in which the $v_k$ may have been considered to represent an
individual resolution element as a prospective member of the cluster
whereas it is instead the mean of an entire cluster k, followed by (2) an
assumption that the summation would have approximately a chi-square
distribution with n degrees of freedom, and further (3) an assumption that
the coefficient in equation (3) does not vary appreciably.
Attention will be given to the points mentioned about $S_\ell^{-1}$ in
equation (3). First, though, it may be recognized as an expedient departure
from rigor in that the n-dimensional space for n spectral channels of data
has for each cluster a separate transformation. Yet, the practical
objective which Jayroe pursues is to reduce computational requirements by
sufficiently nearly achieving statistical independence between the n terms
of the summation in equation (3). This approach seems to be so nearly a
characteristic to be proven by the results that it is retained as a
constraint on the present analysis.

Dalton's (Reference 1) analysis considered that, when normal basic
variables ${}_p x_k$ and ${}_p x_\ell$ have the same population mean,
$F_{p,k\ell}$ or $t_{p,k\ell}^2$ has an F distribution with one and
$M_k + M_\ell - 2$ degrees of freedom when (in the same coordinate system)
random samples of sizes $M_k$ and $M_\ell$ from classes k and $\ell$ have
means ${}_p\bar{x}_k$ and ${}_p\bar{x}_\ell$ and variances ${}_{pp}c_k$ and
${}_{pp}c_\ell$:

$$F_{p,k\ell} = t_{p,k\ell}^2 \tag{4}$$

$$F_{p,k\ell} = \frac{M_k M_\ell (M_k + M_\ell - 2)\,({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{(M_k + M_\ell)(M_k\,{}_{pp}c_k + M_\ell\,{}_{pp}c_\ell)} \tag{5}$$

because $t_{p,k\ell}$ has Student's t distribution with $M_k + M_\ell - 2$
degrees of freedom. Notice that $F_{p,k\ell}$ in equation (5) can be
written as

$$F_{p,k\ell} = \frac{M_k + M_\ell - 2}{4}\left[\frac{4 M_k M_\ell\,({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{(M_k + M_\ell)(M_k\,{}_{pp}c_k + M_\ell\,{}_{pp}c_\ell)}\right] \tag{6}$$

Then, when the two clusters are of equal size and give equal estimators of
variance ${}_{pp}c$, the bracketed term in equation (6) identifies with the
term being summed in equation (3) and is otherwise also an appropriate
average for ${}_{pp}c$ for the two clusters. In a similar manner the term
2n, already mentioned in equation (3), came by replacing
$({}_{pp}c_k + {}_{pp}c_\ell)/{}_{pp}c_\ell$ by 2. Although equation (3) is
already appropriate for use in the next stage to classify an individual
prospective member of a class, symmetry seems to require for the present
stage (when two clusters are to be merged) that the bracketed term in
equation (6) should replace the term being summed in equation (3); i.e.,

    [equation not legible in source]                                  (7)


The two factors in equation (7) are not statistically independent. However,
they can be treated as statistically independent for the purpose of
identifying parameter combination regions over which the relative variation
of one of the factors is small relative to that of the other factor. The
expected value $\mu_{F_{p,k\ell}}$ and variance $\sigma_{F_{p,k\ell}}^2$ of
each of the n terms $F_{p,k\ell}$ in equation (7) are, by Reference 5,

$$\mu_{F_{p,k\ell}} = \frac{M_k + M_\ell - 2}{M_k + M_\ell - 4}, \qquad M_k + M_\ell > 4 \tag{8}$$

$$\sigma_{F_{p,k\ell}}^2 = 2\,\mu_{F_{p,k\ell}}^2 \left(\frac{M_k + M_\ell - 3}{M_k + M_\ell - 6}\right), \qquad M_k + M_\ell > 6. \tag{9}$$
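
Equations (8) and (9) are the mean and variance of an F distribution with
one and $M_k + M_\ell - 2$ degrees of freedom, so they are easy to tabulate.
The following is a minimal sketch with hypothetical function names.

```python
def f_mean(m_k, m_l):
    """Equation (8): mean of F with 1 and m_k + m_l - 2 degrees of freedom."""
    total = m_k + m_l
    if total <= 4:
        raise ValueError("equation (8) requires M_k + M_l > 4")
    return (total - 2) / (total - 4)


def f_variance(m_k, m_l):
    """Equation (9): variance of the same F distribution."""
    total = m_k + m_l
    if total <= 6:
        raise ValueError("equation (9) requires M_k + M_l > 6")
    return 2.0 * f_mean(m_k, m_l) ** 2 * (total - 3) / (total - 6)
```

For two clusters of 50 members each the mean is 98/96, about 1.02, and the
variance is about 2.15; both approach the chi-square values of 1 and 2 as
the clusters grow.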
Computations involving pairs of clusters can be made under the
transformation peculiar to either cluster in the pair, but not both
transformations together. A practical expedient would be to use only the
transformation determined for the larger one of the two clusters. In that
case, in equations such as (5) and (6) one would replace the variance of
the smaller one by the variance of the larger one of the two clusters;
e.g., see equation (20). Then, the statistical independence provided by the
transformation gives the mean and variance of the summation of the n terms
as n times the respective values for the individual terms. Therefore, the
expected value $\mu_{T_2}$ and variance $\sigma_{T_2}^2$ of the bracketed
factor $T_2$ in equation (7) are

$$\mu_{T_2} = 2n\left(\frac{M_k + M_\ell - 2}{M_k + M_\ell - 4}\right) \tag{10}$$

$$\sigma_{T_2}^2 = \frac{8n\,(M_k + M_\ell - 2)^2 (M_k + M_\ell - 3)}{(M_k + M_\ell - 4)^2 (M_k + M_\ell - 6)}. \tag{11}$$

Let $T_1^{2/n}$ represent the coefficient factor in equation (7), in which
each of the n independent standard deviations ${}_{pp}c_\ell^{1/2}$ has an
expected value $b_1\sigma$ and a variance which, by Reference 5, can be
taken as approximately $\sigma^2/2M_\ell$. Then the product of n such
statistically independent factors has a mean which is

$$\mu_{T_1} = (b_1\sigma)^n. \tag{12}$$


Within the accuracy of a first order theory for the propagation of error,
the variance of $T_1$ can be approximated by

    [equation not legible in source]                                  (13)

But if the expected value of $T_1$ is $(b_1\sigma)^n$ in equation (12),
what can one best say might be the expected value of $T_1^{2/n}$? Of course
it would depend on the distribution function. With ERTS data, the value of
n is 4; so one wants the expected value of a square root. Also, one knows
that the result is the geometric mean of the variances of the basic
variables. Therefore, it may be sufficiently accurate to approximate it by
the 2/n power of $(b_1\sigma)^n$, which is $(b_1\sigma)^2$. The variance of
$T_1^{2/n}$, by a further application of first order theory of error
propagation, is

    [equation not legible in source]                                  (14)

where values of $b_1$ as a function of $M_\ell$ are tabulated in Reference
5 (wherein $b_1$ is called b(n) and $M_\ell$ is called n), and

$$\mu_{T_1^{2/n}} \approx b_1^2\,\sigma^2. \tag{15}$$


Thus, to the extent that the two factors for $S_\ell^{-1}$ in equation (7)
can be considered statistically independent, the expected value of
$S_\ell^{-1}$ is

$$\mu_{S^{-1}} = \mu_{T_1^{2/n}}\,\mu_{T_2} \approx 2\,(b_1\sigma)^2\, n \left(\frac{M_k + M_\ell - 2}{M_k + M_\ell - 4}\right). \tag{16}$$

Then, by a further application of the first order approximation of error
propagation, the variance of $S_\ell^{-1}$ is, relatively,

    [equation not legible in source]                                  (17)


Thus, the expected value and variance of $S_\ell^{-1}$ are both
proportional to n, but the ratio of the relative contributions to the
variance of $S_\ell^{-1}$ due to the two factors in equation (7) is
independent of n. When the two clusters k and $\ell$ are the same size,
then the second factor in equation (7) makes a contribution to the total
variance which decreases from 45 percent for two clusters of size seven
each, but remains very nearly a constant 1/3 of the total variance for any
cluster size of 10 or more. So, the assumption of statistical independence
between the two factors in equation (7) would seem to be problematical for
getting any accurate estimate of the expected value of $S_\ell^{-1}$, etc.,
which Jayroe pursued in his equation (29). However, there seems to be no
easy alternative to choosing some reasonable approximation to a decision
rule which would merge two clusters if the summation in equation (7) would
not exceed its expected value plus the product of some parameter C (which
may be a constant or a function of n, see the last paragraph in this
section) and the theoretical value of the standard deviation of the
summation; i.e.,

$$\sum_{p=1}^{n} F_{p,k\ell} \le n\,\mu_{F_{p,k\ell}} + C\sqrt{n}\,\sigma_{F_{p,k\ell}} \tag{18}$$


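where $F_{p,k\ell}$ is given by equation (5), $\mu_{F_{p,k\ell}}$ is given
by equation (8), and $\sigma_{F_{p,k\ell}}$ is the square root of the
variance in equation (9). Then, by substituting from the cited equations
into equation (18) and rearranging the material, one gets

$$\left(\frac{M_k + M_\ell - 4}{M_k + M_\ell}\right)\sum_{p=1}^{n}\frac{M_k M_\ell\,({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{M_k\,{}_{pp}c_k + M_\ell\,{}_{pp}c_\ell} \le n + C\left(\frac{M_k + M_\ell - 3}{M_k + M_\ell - 6}\right)^{1/2}\sqrt{2n} \tag{19}$$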
where, yet rigorously, the terms on the right side are the expected value
and C standard deviations of the left side. One can now propose that, in
equation (19), the factors involving the sums of the sample sizes might be
eliminated as a practical expedient. The elimination of the factor on the
left side, which would account for most of the error, would be of no
practical consequence for large clusters; it would cause an error in the
mean of only 4 percent when the sum of the two cluster sizes is 100, and
this would not exceed 1/7 of the standard deviation when the number of
channels n does not exceed 24. In practice, the given example is not
intended to suggest any such size as a lower limit for cluster size; most
clusters are larger than any permitted minimum size, and typically the
combined size of two clusters is considerably larger than twice any minimum
size. Therefore, as a more practical expedient than the more rigorous
equation (19), an appropriate decision rule would seem to be that two
clusters or classes k and $\ell$ of sizes $M_k$ and $M_\ell$ should be
combined into the same class when

$$\sum_{p=1}^{n}\frac{({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{\dfrac{{}_{pp}c_k}{M_\ell} + \dfrac{{}_{pp}c_\ell}{M_k}} \le n + C\sqrt{2n}.$$
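
The two figures quoted for the size of the error can be checked
numerically. The sketch below assumes that the factor dropped from the left
side of equation (19) is $(M_k + M_\ell - 4)/(M_k + M_\ell)$, and that the
mean and standard deviation of the summation are close to n and
$\sqrt{2n}$; the function names are hypothetical.

```python
import math


def dropped_left_factor(total_size):
    """Assumed left-side factor of equation (19), (M_k + M_l - 4)/(M_k + M_l)."""
    return (total_size - 4) / total_size


def mean_error_in_standard_deviations(total_size, n):
    """Error in the mean caused by dropping the factor, in units of sqrt(2n)."""
    relative_error = 1.0 - dropped_left_factor(total_size)
    return relative_error * n / math.sqrt(2.0 * n)
```

With a combined size of 100 the relative error is 0.04, and with n = 24
channels the error in the mean is about 0.139 standard deviation, just under
1/7.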


Let the designations of the clusters k and $\ell$ be such that
$M_\ell \ge M_k$. Then, by the expedient which was discussed in the
paragraph following equation (9), by replacing ${}_{pp}c_k$ with
${}_{pp}c_\ell$ and using the transformation determined for cluster $\ell$
in the summation on the left side, the decision rule becomes

$$\left(\frac{M_k M_\ell}{M_k + M_\ell}\right)\sum_{p=1}^{n}\frac{({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le n + C\sqrt{2n}. \tag{20}$$
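
The equation (20) decision rule, read as
$(M_k M_\ell/(M_k + M_\ell)) \sum_p ({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2 / {}_{pp}c_\ell \le n + C\sqrt{2n}$,
can be sketched as a single function. It is assumed here that both mean
vectors have already been expressed in the principal axis coordinates of
the larger cluster $\ell$ and that var_l holds that cluster's n variances;
the function name and argument layout are hypothetical.

```python
import math


def should_merge(mean_k, m_k, mean_l, var_l, m_l, C=0.0):
    """Equation (20): test whether clusters k and l should be merged.

    mean_k, mean_l: channel means in cluster l's principal axis coordinates.
    var_l: variances of the larger cluster l along those axes.
    m_k, m_l: cluster sizes, with m_l >= m_k.
    C: number of standard deviations added to the expected value n.
    """
    n = len(mean_l)
    weight = m_k * m_l / (m_k + m_l)
    statistic = weight * sum(
        (xk - xl) ** 2 / c for xk, xl, c in zip(mean_k, mean_l, var_l)
    )
    return statistic <= n + C * math.sqrt(2.0 * n)
```

Only one transformation is needed per pair, which is the saving over a rule
that must be evaluated twice with the roles of k and $\ell$ reversed.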


Jayroe's results, both in his equation (29) for the expected value of
$S_\ell^{-1}$ and in his equation (30) decision rule for merging two
clusters, would seem to require that the denominator in each term in the
summation in equation (3) (Jayroe's equation (28)) would be the variances
with respect to the means instead of the variances of the variables before
they are averaged. Therefore, it would seem that the criterion which Jayroe
has in his equation (30) exceeds the expected value of the indicated
summation by a factor which would be approximately half of the size of a
cluster. One would expect that the model in that form might show a tendency
to combine clusters excessively.

Jayroe notes that the decision rule, his equation (30), is a hyperellipse
in the principal axis coordinates; that seems yet to be true with equation
(20). He says that the threshold in the decision rule (the right side of
the equation) is independent of the cluster and depends only on the
dimension n of the feature space; that is true also in equation (20).
However, Jayroe added: "Thus, if an elliptical boundary decision rule is
used in the principal axis coordinate system, the theorem can be extended
to say that the diagonal transformation is not needed and only the
eigenvector transformation is needed since the threshold can always be
written as some constant times [the geometric mean of the n variances of
cluster $\ell$]," which is the coefficient factor $T_1^{2/n}$ in equations
(3) and (7) with its expected value and variance approximated in equations
(15) and (14), respectively. Unfortunately, though, both the expected value
and the variance of the cited function depend on the unknown variance
$\sigma^2$ of the population of which the given cluster is only a random
sample of size $M_\ell$. It is agreed that the diagonal transformation is
not needed for computations; it does, however, show the origin and context
of the equation (20) decision rule. The sample means and variances which
are used in the summation on the left side of equation (20) are given by
Jayroe's equations (17) and (18), without the diagonal transformations;
they do, though, presuppose that the computations are done in principal
axis coordinates in order that the n terms in the summation in equation
(20) are statistically independent.

The principal axes of clusters which are random samples from the same
population will have some distribution with respect to the principal axes
of the population. Thus, the principal axes for cluster k will generally be
different from those for cluster $\ell$, and different from those which
follow from combining the two clusters. It is expected that it will be
sufficiently accurate to use the computational expedient which ignores the
distinction cited because the hypothesis being tested by the equation (20)
decision rule is that the two clusters are from the same population.

Jayroe's model involves two determinations of his decision rule, his
equation (30), by reversing the roles of the two clusters because that
equation is not symmetric with respect to the two clusters k and $\ell$.
The proposed revised decision rule, equation (20), will require only half
as much computation because it computes a transformation for only the
larger one of the two clusters.

In Jayroe's model, using his equation (30) as a decision rule for combining
clusters or classes, it is not likely that a sufficient number of classes
would tend to remain for some purposes. With equation (20), however, the
number of clusters or classes which will remain distinct will depend on the
value chosen for the adjustable parameter C. Three considerations are
evident: (1) all clusters which represent the same population class should
be merged, (2) unlike classes should not be merged except, (3) when there
are more statistically distinct classes than some upper limit which must be
imposed as a computational or other constraint, then further merging is
necessary. The statistical significance of values of C, except for the
smaller clusters, is illustrated approximately by: (1) a value of C of
$-(1 + n/100)$ would combine all pairs of clusters which show less than a
10 percent confidence level of being from different populations, (2) a
value of C of $-(2/3)/\sqrt{2n}$ would combine no clusters which differ by
more than a 50 percent confidence level, and (3) a value of C of 4/3 would
combine all clusters except those which show at least a 90 percent
confidence of distinct populations.

b. Order of Merging 

In Jayroe's model the distance between cluster centers was not considered
in choosing which pair of clusters should be tested for merging. It would
seem that the order of merging would affect the quality of the results.
Some of the computation time which is saved by using equation (20) can well
be expended toward this improvement. When, in the course of the analysis,
there remain K clusters or classes, then there corresponds a square
symmetric matrix of center separation values which upon being ranked have
some smallest value, possibly repeated. The corresponding pair of clusters
k and $\ell$ should be tested by equation (20) to see if they should be
merged. If they are merged, then in the matrix columns k and $\ell$ and
rows k and $\ell$ are deleted and are replaced by one new row and column.
The smallest value is again sought, etc. But if clusters k and $\ell$ are
not merged, then their element in the matrix is replaced by a number larger
than the largest element before proceeding, and another matrix of
uncombinable pairs is begun whose elements are the values of the left side
of equation (20), etc. Another decision rule will be needed so that when
the distance between centers exceeds a certain value the equation (20) test
will be skipped:

$$\sum_{p=1}^{n} ({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2 > \ldots \quad \text{[right side not legible in source]}$$

Then, if the number of remaining clusters or classes exceeds the maximum
allowable number, any further reduction is made by using the matrix of
computed values of the left side of equation (20), so far as it had been
used; the smallest element would identify the pair to be merged even though
they qualified as distinct classes.
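
The closest-centers-first ordering can be sketched as a simple loop. This is
an illustration only, under several assumptions that are not in the text:
every cluster is a dictionary with keys "size", "mean", and "var" given in
one shared coordinate system, merged clusters get pooled means and
variances, and the equation (20) test is taken as the weighted sum of
squared mean differences over the larger cluster's variances compared with
$n + C\sqrt{2n}$. The skip rule for distant centers and the forced merging
down to a maximum number of classes are omitted. All names are hypothetical.

```python
import math
from itertools import combinations


def center_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"])))


def passes_merge_test(a, b, C=0.0):
    """Equation (20), with the larger cluster supplying the variances."""
    small, large = sorted((a, b), key=lambda c: c["size"])
    n = len(large["mean"])
    weight = small["size"] * large["size"] / (small["size"] + large["size"])
    statistic = weight * sum(
        (s - g) ** 2 / v
        for s, g, v in zip(small["mean"], large["mean"], large["var"])
    )
    return statistic <= n + C * math.sqrt(2.0 * n)


def pooled(a, b):
    """Size, mean and (divisor-M) variance of the union of two clusters."""
    m = a["size"] + b["size"]
    mean = [(a["size"] * x + b["size"] * y) / m
            for x, y in zip(a["mean"], b["mean"])]
    var = [(a["size"] * (va + (x - mu) ** 2)
            + b["size"] * (vb + (y - mu) ** 2)) / m
           for x, y, va, vb, mu in zip(a["mean"], b["mean"],
                                       a["var"], b["var"], mean)]
    return {"size": m, "mean": mean, "var": var}


def merge_closest_first(clusters, C=0.0):
    active = dict(enumerate(clusters))  # key -> cluster
    next_key = len(active)
    rejected = set()                    # pairs that already failed the test
    while True:
        candidates = [
            (center_distance(active[i], active[j]), i, j)
            for i, j in combinations(sorted(active), 2)
            if (i, j) not in rejected
        ]
        if not candidates:
            return list(active.values())
        _, i, j = min(candidates)       # pair with the closest centers
        if passes_merge_test(active[i], active[j], C):
            active[next_key] = pooled(active.pop(i), active.pop(j))
            next_key += 1
        else:
            rejected.add((i, j))
```

The set of rejected pairs plays the role of the matrix elements that are
overwritten with a large number after a failed test.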

4. Stage Four: Classification

The decision rule which Jayroe (Reference 2) uses, his equation (31), for
deciding when an individual element can be added to a particular class in
the final classification is that the summation in equation (3) must not
exceed 2n. The explanation that the factor 2 is used because the exponent
in a normal distribution is divided by that factor is not convincing. As he
says, though, it does seem appropriate to use a less restrictive criterion
than that which would be right for deciding about merging two clusters.
Actually, considering that the mean and variance are approximately n and 2n
for large clusters, the given criterion amounts to adding $(n/2)^{1/2}$
standard deviations to the mean, which, with the 12-channel data reported
(Reference 2), would classify an individual resolution element into a class
unless its difference is significant at a 98 percent confidence level. With
4-channel data, as in ERTS, the confidence would be 90 percent instead of
98 percent with the same decision rule, Jayroe's equation (31).
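
The 98 and 90 percent figures follow from the chi-square distribution with
n degrees of freedom evaluated at the threshold 2n. A minimal check is
sketched below; it uses the closed-form chi-square distribution function
that holds for an even number of degrees of freedom, which covers both
n = 12 and n = 4. The function name is hypothetical.

```python
import math


def chi_square_cdf_even(x, dof):
    """Chi-square distribution function for an even number of degrees of freedom.

    Uses P(X <= x) = 1 - exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!.
    """
    if dof % 2 != 0 or dof <= 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total
```

The threshold 2n lies $(2n - n)/\sqrt{2n} = (n/2)^{1/2}$ standard deviations
above the mean; the distribution function gives about 0.98 for n = 12 and
about 0.91 for n = 4.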

It seems prudent to derive more rigorously a decision rule which does not
presuppose large clusters for classifying individual resolution elements
into established clusters. For this purpose the presupposition of normal
variables in principal axis coordinate systems will be continued, and the
same notation as in equation (3) will be used except that ${}_p v$ is the
coordinate of an individual resolution element instead of the sample mean
of a cluster. Dalton showed that

$$\left(\frac{M_\ell - 1}{M_\ell + 1}\right)\sum_{p=1}^{n}\frac{({}_p v - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} = \sum_{p=1}^{n} F_{p,\ell}(1, M_\ell - 1) \tag{21}$$

where each of the n terms on the right has an F distribution with one and
$M_\ell - 1$ degrees of freedom, and they are statistically independent due
to the principal axes coordinates. Then, for $M_\ell \ge 6$ this gives, in
the form of a decision rule as a function less than or equal to its
expected value plus the product of some parameter D and the standard
deviation,

$$\left(\frac{M_\ell - 3}{M_\ell + 1}\right)\sum_{p=1}^{n}\frac{({}_p v - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le n + D\sqrt{2n}\left(\frac{M_\ell - 2}{M_\ell - 5}\right)^{1/2}. \tag{22}$$
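
A hedged sketch of the classification rule follows. It takes equation (22)
in the form
$((M_\ell - 3)/(M_\ell + 1)) \sum_p ({}_p v - {}_p\bar{x}_\ell)^2 / {}_{pp}c_\ell \le n + D\sqrt{2n}\,((M_\ell - 2)/(M_\ell - 5))^{1/2}$,
which is what the F distribution with one and $M_\ell - 1$ degrees of
freedom gives for the mean and standard deviation. The element and the
class statistics are assumed to be in the class's principal axis
coordinates; the function name is hypothetical.

```python
import math


def belongs_to_class(v, mean_l, var_l, m_l, D=0.0):
    """Equation (22): decide whether element v can join class l.

    v: channel values of the element; mean_l, var_l: class means and
    variances; m_l: class size (at least 6); D: standard deviations allowed
    above the expected value.
    """
    if m_l < 6:
        raise ValueError("equation (22) requires at least 6 class members")
    n = len(v)
    statistic = (m_l - 3) / (m_l + 1) * sum(
        (x - mu) ** 2 / c for x, mu, c in zip(v, mean_l, var_l)
    )
    threshold = n + D * math.sqrt(2.0 * n) * math.sqrt((m_l - 2) / (m_l - 5))
    return statistic <= threshold
```

Setting D to $(n/2)^{1/2}$ reproduces a threshold of about 2n for large
classes, so the parameter D can be compared directly with Jayroe's rule.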



5. Further Passes 

Jayroe (Reference 2) explains that his program has the capability, when the
size of the p x p array for cluster selection has caused incomplete
classification of the ground scene, to reduce the size of the array in
order to search for further clusters and to make a further classification
of the data. It would appear that this is better than using a smaller array
in the first place. This is because, as Jayroe says, "The fixed-shape
array, if chosen large enough, will not permit the mixing of features
because the open gaps in the boundaries will be so small compared to the
array size that the array will not be able to pass through the boundary."
Jayroe's statement about the 10 x 10 array, that the minimum sample size
which it provides (100) is very adequate for statistical calculations,
inadvertently may give the impression that a smaller array would give a
sample size which would statistically not be adequate for the determination
of (1) whether or not two such clusters should be merged or (2) whether or
not the class which it might represent should also contain a particular
individual resolution element which is to be classified. However, equations
(21) and (22) are based on the F distributions instead of the chi-square
distribution, the statistical requirement for which is that the clusters
must not have fewer than 6 members. In either case the basic variable is
presupposed to be distributed approximately normally. The statistical
reason that larger clusters are needed when one does not use the F
distributions is that, in order to use the chi-square distribution for the
summation on the left sides of equations (21) and (22), the indicated means
and variances must be presupposed to be identical with those of the
(unknown) population of which the cluster is a random sample of size
$M_\ell$.

C. Conclusions and Recommendations

Jayroe's (Reference 2) decision criterion to classify a resolution element
as being homogeneous is (his equation (16))

$$a s_x^2 + b s_y^2 + c s_x s_y \le 1 \tag{1}$$

and that otherwise the element is a boundary. It seems likely that the
criterion could be improved by writing it as

$$a s_x^2 + b s_y^2 + c s_x s_y \le B \tag{2}$$

and experimentally checking whether some other value of B in the vicinity
of 1 might give a model which would have a better balance between
effectiveness and computation requirement.

The decision rule which Jayroe uses to see if clusters k and $\ell$ should
be merged, when the clusters have individual transformations giving
statistical independence of their n-channel spectral data, is (his
equation (30))

$$\sum_{p=1}^{n}\frac{({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le n$$

provided that the equation is also satisfied when the roles of k and
$\ell$ are reversed. Instead, a better decision rule would seem to be,
where k and $\ell$ are such that $M_\ell \ge M_k$,

$$\left(\frac{M_k M_\ell}{M_k + M_\ell}\right)\sum_{p=1}^{n}\frac{({}_p\bar{x}_k - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le n + C\sqrt{2n} \tag{20}$$

where $M_k$ and $M_\ell$ are the sample sizes of class (or cluster) k and
$\ell$, and where C is the number of standard deviations from the expected
value of the left side. Although the best value for C might be in the
vicinity of zero, some experiments with data might show a better value. The
clusters with the closest centers should be tested for merging before
testing more distant clusters.

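The decision rule which Jayroe uses to see if an individual resolution
element should be added to a class $\ell$ is (Jayroe's equation (31))

$$\sum_{p=1}^{n}\frac{({}_p v - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le 2n.$$

Instead, a better decision rule would seem to be

$$\left(\frac{M_\ell - 3}{M_\ell + 1}\right)\sum_{p=1}^{n}\frac{({}_p v - {}_p\bar{x}_\ell)^2}{{}_{pp}c_\ell} \le n + D\sqrt{2n}\left(\frac{M_\ell - 2}{M_\ell - 5}\right)^{1/2} \tag{22}$$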
where D is the number of standard deviations from the expected value of the
left side. Because his sample sizes were sufficiently large for the
purpose, Jayroe's choice in equation (24) corresponds to adding
$(n/2)^{1/2}$ standard deviations to expected value n, and he was using 12
channels of data. This would imply a value for D of $\sqrt{6}$, but some
further experiments with equation (5) might show a better value for D.

Some experimental effort is needed to establish the best combination of
values of B, C, D, and p, where p is the size of the p x p array which
determines the minimum size of a cluster and where B, C, and D are the
model parameters in equations (2), (20), and (22).
APPENDIX A: Reference 4

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION

GEORGE C. MARSHALL SPACE FLIGHT CENTER
Marshall Space Flight Center, Alabama 35812


REPLY TO 

ATTN OF: S&E-AERO-YF-3-73 


September 19, 1973 


TO: S&E-COMP-RRV/Mr. Jack A. Jones 

FROM: S&E-AERO-YF/Mr. Charles C. Dalton 

SUBJECT: Request for Program of Algorithm from NASA TMX-64762 and
MSFC Memo S&E-AERO-YF-2-73 on Account No. 177-32-71 (Task
Agreement J99)

The subject report and memo which were recently given by me offer a
method for non-supervised classification and mapping of remote sensing
multispectral data. A program, which please have prepared, will enable
us to study the computational performance and efficiency of that method
vis-a-vis our other methods. The subject algorithm, in somewhat further
desired detail, is as follows:

ALGORITHM FOR UNSUPERVISED CLASSIFICATION USING F DISTRIBUTIONS 

For each class or prospective class one needs values for the follow- 
ing parameters: 

m = number of members in the class

x̄_k = (1/m) Σ_{a=1}^{m} x_{ka} = class mean in each channel k = 1, 2, ..., K


s_k^2 = (1/m) Σ_{a=1}^{m} (x_{ka} - x̄_k)^2 = class variance in each channel


" m S ^*ka“ *k^ X Ea“ pair ° f channels k 30(1 * 

m 


“p "(s^) 


A / 2 2 

V\ \ 




TF 


EF 


2 “f 



KU p 

where K is 

the : 


K-l 

K 

Ko f 

+ 2 E 

E 

r 

k 1 ’! 

Jl-k+1 


kV 


Also, for each pair of established classes i and j containing m_i and
m_j members one needs values for the following parameters:








ZF 


U 


(Oj+ m^- 2) / (m^+ 4) 


2p (m.+ m - 3)/(m + m - 6) 

F ij 1 J 1 j 

Vi ( V v 2) i ( \r \i > z 




°i + "j 


ii.i, (m + m.- 2) 
i j i 1 


k-l 2 . Z 

B iV Vkj 




( Vki + "j r ^ )(B l s £i + Vij> 


I 

r 


EF 


ij 


“ Ky. 




’IF 


ij 


Ko! 


K-l 
+ 2 E 


K 

r 


ij 


k-l f.*k+l 


^ki.ij 


ij 


(W 1J- % >/0 O t1 


\ 
The two other formulas which are always used together, with a purpose
which depends on what datum is substituted for the parameter x_k, are


EF 


and 

A 


Sr X <v V 2/s t 

I CEF-Uj-p) /a Efr | . 


(54) 


Preliminary step. Is this a re-start? No: go to step 1. Yes:
go to step 25.

Step 1. Read control parameters A , A, , M(> 6) , W , A_, P, and 
A o 1 — max r 

V 

Step 2. Read the first M samples. 

Step 3. Calculate parameters for prospective class. 

Step 4. With the x̄_k, s_k^2, etc., from step 3, calculate a value of
A in equation (54) for each of the M samples by using the values of x_k
for that particular sample in equation (54) with the minus sign. Does
the largest value of A satisfy A < A_0? Yes: go to step 7. No: go to
step 5.

Step 5. Discard the first sample accumulated. 

Step 6. Read a new sample, then go to step 3. 

Step 7. Designate a new class having the parameters extant,
including the class mean of the sample values of A, say Ā.

Step 8. Does the program reach the end of the sample? Yes: go 
to step 19. No: go to step 9. 

Step 9. Does the number of classes W satisfy W ≤ W_max? Yes: go
to step 12. No: go to step 10.

Step 10. Calculate class-pair parameters A_ij for all combinations
of classes in pairs.
Step 11. Combine the two classes i and j which give the smallest
pair-parameter A_ij and compute the single-class parameters for the
resulting class, including Ā, etc. Go to step 12.

Step 12. Read a new sample. 

Step 13. By using the values of x_k from the new sample in equation
(54) with the plus sign, calculate a value of A for each of the W
established classes according to their given values of m, x̄_k, s_k^2,
μ_F, and σ_ΣF^2. Does the smallest one of the W values of A satisfy
A < A_1? Yes: add the sample to that class, revise the parameters of
that class and go to step 8. No: put the sample in hold and go to
step 14.

Step 14. Has the number of samples in hold reached M? No: go to 

step 12. Yes: go to step 15. 

Step 15. Calculate parameters for prospective class. 

Step 16. With x̄_k, s_k^2, etc. from step 15, calculate a value of A
in equation (54) for each of the M samples by using the values of x_k
for the particular sample in equation (54) with the minus sign. Does
the largest value of A satisfy A < A_0? Yes: go to step 17. No:
discard the first one of the M samples held for step 15 and go to step 12.

Step 17. Designate a new class with the parameter values which are
extant (from step 15) and the mean Ā of the sample values of A.

Step 18. Empty the hold from step 14 and go to step 8.

Step 19. Subtract one from the value retained for the P parameter 
and retain the new value. Is the result less than one? Yes: go to 

step 20. No: go to step 25. 

Step 20. Is the smallest A_ij less than A_? Yes: go to step 21.
No: go to step 22.

Step 21. Combine the classes i and j, compute the parameters
(including Ā) of the resulting class k and the parameters A_kℓ relating
it to each other class ℓ, and go to step 20.

Step 22. Prepare a print-out/read-in tape with re-start versatility. 




Step 23. Print out the classification map, the class pair parameters
A_ij, and for each class the parameters m, x̄_k, s_k^2, and Ā, including
all channels k and pairs of channels k and ℓ. Identify the print-out.


Step 24. Stop. 


Step 25. Is this a re-start? No: go to step 28. Yes: read
revised control parameters W_max, Ap, P, and A_ and the re-start tape (of
step 22) and go to step 26.

Step 26. Is W greater than W_max? No: go to step 28. Yes: go to
step 27.

Step 27. Combine the pair of classes i and j, which correspond to
the smallest A_ij, into a single class k, compute the class parameters
and the A_kℓ which relate it to each other class ℓ, and go to step 26.


Step 28. The extant membership of the established classes, resulting
from the completed pass, the re-start after a prior classification, or a
standardized pre-classification, gives parameter values m, x̄_k, s_k^2,
μ_F, and σ_ΣF^2 (for equation (54)) which are retained throughout a new
complete pass of the data (to be revised only at the end of the data
pass) for a revised classification.


Step 29. Read the upcoming sample of data in the most economical 
order (e.g., first, second, ...). 

Step 30. Use the values of x_k for the sample and the plus sign in
equation (54) to calculate a value of A for each of the W classes.

Step 31. Classify the sample by the class with the smallest A,
and remember that value (for step 35).

Step 32. Does the program reach the end of the sample sequence?
Yes: go to step 33. No: go to step 29.

Step 33. The W established classes now have new memberships but 
parameter values from the previous classification. Does the smallest 
class i now have less than six members? Yes: go to step 34. No: go 

to step 35. 
Step 34. The smallest class i has some smallest A_ij identifying its
closest neighbor class j. Is A_ij less than A^? Yes: combine classes i
and j and go to step 33. No: hold class i for step 35 and go to step 33.

Step 35. By the new memberships, revise the parameters for all classes
with not less than six members and revise all A_ij for which both classes
i and j have not less than six members. Those classes with less than six
members must retain a value of six for m for any next classification. Go
to step 19.

Charles C. Dalton 
Flight Data Statistics Office 
Aerospace Environment Division 
Aero-Astrodynamics Laboratory
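
The control flow of three recurring operations in the steps above (seeding a prospective class from a sliding window of M samples, steps 2 to 7; assigning a sample to the class with the smallest A, steps 30 and 31; and combining the closest pair of classes until W does not exceed W_max, steps 26 and 27) can be sketched as follows. This is only an illustration under stated assumptions: classes are represented by their member lists rather than by stored parameters, and the statistics `toy_A` and `toy_A_pair` are hypothetical one-channel stand-ins, not the A of equation (54) or the pair parameter A_ij of the memo.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple


def toy_A(x: float, members: Sequence[float]) -> float:
    """Hypothetical stand-in for A: distance of a sample from a class mean."""
    return abs(x - mean(members))


def toy_A_pair(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for A_ij: distance between two class means."""
    return abs(mean(a) - mean(b))


def seed_class(samples: Sequence[float], M: int, A0: float,
               stat: Callable = toy_A) -> Tuple[Optional[List[float]], int]:
    """Steps 2-7: slide a window of M samples until max A in the window < A0.

    Returns the accepted window (or None) and the number of samples read.
    """
    window: List[float] = []
    for n_read, s in enumerate(samples, start=1):
        window.append(s)              # steps 2 and 6: read a sample
        if len(window) > M:
            window.pop(0)             # step 5: discard the first sample
        if len(window) == M and max(stat(x, window) for x in window) < A0:
            return list(window), n_read   # step 7: designate a new class
    return None, len(samples)


def assign(sample: float, classes: Sequence[Sequence[float]],
           stat: Callable = toy_A) -> Tuple[int, float]:
    """Steps 30-31: index of the class with the smallest A, and that A."""
    scores = [stat(sample, c) for c in classes]
    best = min(range(len(classes)), key=scores.__getitem__)
    return best, scores[best]


def merge_down(classes: Sequence[Sequence[float]], W_max: int,
               pair_stat: Callable = toy_A_pair) -> List[List[float]]:
    """Steps 26-27: combine the closest pair until W <= W_max."""
    out = [list(c) for c in classes]
    while len(out) > W_max:
        _, i, j = min((pair_stat(out[i], out[j]), i, j)
                      for i in range(len(out))
                      for j in range(i + 1, len(out)))
        out[i] = out[i] + out[j]      # class k replaces class i
        del out[j]
    return out
```

In a program following the memo, the two toy statistics would be replaced by routines implementing equation (54) and the class-pair parameter, and each class would carry its stored parameters instead of its raw members.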